 sequential data


How Patterns Dictate Learnability in Sequential Data

Neural Information Processing Systems

Sequential data--ranging from financial time series to natural language--has driven the growing adoption of autoregressive models. However, these algorithms rely on the presence of underlying patterns in the data, and their identification often depends heavily on human expertise. Misinterpreting these patterns can lead to model misspecification, resulting in increased generalization error and degraded performance. The recently proposed evolving pattern (EvoRate) metric addresses this by using the mutual information between the next data point and its past to guide regression order estimation and feature selection. Building on this idea, we introduce a general framework based on predictive information--the mutual information between the past and the future, $I(X_{\text{past}}; X_{\text{future}})$. This quantity naturally defines an information-theoretic learning curve, which quantifies the amount of predictive information available as the observation window grows. Using this formalism, we show that the presence or absence of temporal patterns fundamentally constrains the learnability of sequential models: even an optimal predictor cannot outperform the intrinsic information limit imposed by the data. We validate our framework through experiments on synthetic data, demonstrating its ability to assess model adequacy, quantify the inherent complexity of a dataset, and reveal interpretable structure in sequential data.
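As a rough illustration of the learning-curve idea, the sketch below computes the predictive information in closed form for a toy stationary Gaussian AR(1) process. The toy process, the function names, and the parameter values are assumptions made for this example and are not taken from the paper. For jointly Gaussian variables the mutual information is half the log-ratio of covariance determinants, so no sampling is needed.

```python
import math

def ar1_cov(n, phi, sigma2=1.0):
    """Covariance of n consecutive samples of a stationary AR(1) process
    x_t = phi * x_{t-1} + noise, with noise variance sigma2 (toy model)."""
    var = sigma2 / (1.0 - phi ** 2)
    return [[var * phi ** abs(i - j) for j in range(n)] for i in range(n)]

def logdet(mat):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(mat)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total

def predictive_information(phi, past_len, future_len):
    """I(X_past; X_future) in nats for the Gaussian AR(1) toy process."""
    cov = ar1_cov(past_len + future_len, phi)
    past = [row[:past_len] for row in cov[:past_len]]
    future = [row[past_len:] for row in cov[past_len:]]
    # Gaussian identity: I(A; B) = 0.5 * log(det(S_A) * det(S_B) / det(S_AB))
    return 0.5 * (logdet(past) + logdet(future) - logdet(cov))

# Learning curve: predictive information as the observation window grows.
learning_curve = [predictive_information(0.8, k, 5) for k in range(1, 6)]
```

Because an AR(1) process is first-order Markov, this curve is flat: a single past sample already carries all the information about the future, and longer windows add nothing. Processes with longer memory would show a curve that keeps rising before it saturates.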


Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Neural Information Processing Systems

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.






the four main areas of criticism below (reviewers referred to as R1-5)

Neural Information Processing Systems

We first thank the reviewers for their insightful comments, which we have taken into careful consideration. If our work were to be evaluated using only performance metrics, this criticism would be fair. Learning paradigms for networks of 'convex layers' have been shown to be effective (e.g. The key advance over standard SCNs is that we show how to perform non-linear computations in these systems. Standard SCNs such as in Boerlin et al. (2013) are restricted to linear computations. It may seem surprising, but such layers are actually not well understood!



Appendix 1

Neural Information Processing Systems

$$\min_{P} \sum_{i,j} P_{i,j} C_{i,j} - \gamma H(P) \quad \text{subject to} \quad P \in \mathbb{R}_{+}^{t \times t},\; P^{\top} \mathbf{1}_t = \mathbf{1}_t,\; P \mathbf{1}_t = \mathbf{1}_t, \tag{6}$$
where $P_{i,j}$ is the transport plan and $C_{i,j}$ is the ground metric that measures the distance between point $i$ in the source and point $j$ in the target. This will induce some smoothness and wiggle room in the solution of our objective. To increase the diversity of the observed trajectories, we inject Gaussian noise ($\sigma = 0.05$) into the trajectories by perturbing the initial velocities. Since two-body systems are non-chaotic, we split the data so that the training set has $[m_{\min}, m_{\max}] = [0.8, 1.2]$ while the test set has $[m_{\min}, m_{\max}] = [0.9, 1.3]$, which creates a distribution shift between domains. The initial velocities of all bodies are derived from their initial positions by rotating them by 90 degrees and scaling them by r1.5.
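An entropy-regularized transport problem of the kind in Eq. (6) is commonly solved with Sinkhorn iterations, which alternately rescale the rows and columns of the kernel exp(-C/γ). The excerpt does not say which solver the authors use, so the following is only a minimal sketch under that assumption; the function name `sinkhorn`, the toy cost matrix, and the values of `gamma` and `n_iters` are hypothetical.

```python
import math

def sinkhorn(cost, gamma=0.5, n_iters=500):
    """Approximate the entropy-regularized plan for a square cost matrix,
    with every row and every column of the plan constrained to sum to 1."""
    t = len(cost)
    # Gibbs kernel K = exp(-C / gamma); a larger gamma gives a smoother plan.
    kernel = [[math.exp(-cost[i][j] / gamma) for j in range(t)] for i in range(t)]
    u = [1.0] * t
    v = [1.0] * t
    for _ in range(n_iters):
        # Rescale so that the rows of diag(u) K diag(v) sum to 1 ...
        u = [1.0 / sum(kernel[i][j] * v[j] for j in range(t)) for i in range(t)]
        # ... then so that the columns sum to 1.
        v = [1.0 / sum(kernel[i][j] * u[i] for i in range(t)) for j in range(t)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(t)] for i in range(t)]

# Toy ground metric (an assumption): scaled squared distance between indices.
cost = [[(i - j) ** 2 / 4.0 for j in range(3)] for i in range(3)]
plan = sinkhorn(cost, gamma=0.5)
```

The resulting plan puts most of its mass on low-cost pairs while keeping every entry strictly positive, which is the smoothing effect of the entropy term. For very small `gamma` the kernel can underflow, and a log-domain variant would be the safer choice.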


PCF-GAN: generating sequential data via the characteristic function of measures on the path space

Neural Information Processing Systems

Generating high-fidelity time series data using generative adversarial networks (GANs) remains a challenging task, as it is difficult to capture the temporal dependence of joint probability distributions induced by time-series data. Towards this goal, a key step is the development of an effective discriminator to distinguish between time series distributions. We propose the so-called PCF-GAN, a novel GAN that incorporates the path characteristic function (PCF) as the principled representation of time series distribution into the discriminator to enhance its generative performance. On the one hand, we establish theoretical foundations of the PCF distance by proving its characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which ensure the stability and feasibility of training the PCF-GAN. On the other hand, we design efficient initialisation and optimisation schemes for PCFs to strengthen the discriminative power and accelerate training efficiency. To further boost the capabilities of complex time series generation, we integrate the auto-encoder structure via sequential embedding into the PCF-GAN, which provides additional reconstruction functionality. Extensive numerical experiments on various datasets demonstrate the consistently superior performance of PCF-GAN over state-of-the-art baselines, in both generation and reconstruction quality.